A lot of people think open models are bad at coding. They’re usually wrong.
What’s actually bad most of the time is the coding agent harness.
That distinction matters more than most developers realize.
Because coding performance today is no longer just about:
- raw model intelligence
- benchmark scores
- parameter count
Increasingly, performance depends on:
- orchestration
- tool execution
- context handling
- memory
- retries
- provider routing
- caching
- runtime design
That layer is called the harness.
And honestly:
Most coding agents still treat it as an afterthought.
What Is a Coding Agent Harness?
A harness is the runtime system around the model.
The model itself only generates tokens.
The harness decides:
- which tools the model can access
- how tool calls are validated
- how memory is managed
- how context windows are handled
- how retries work
- how providers are routed
- how errors get repaired
- how model switching works
- how parallel execution behaves
In other words:
The harness is the operating system for the agent.
Two coding agents can use the exact same model and produce dramatically different results depending on how the harness is engineered.
That’s why:
- the same model can feel “smart” in one coding agent
- and broken in another
Why Generic Coding Agents Struggle With Open Models
Most coding agents were designed around closed-model assumptions.
- Stable APIs.
- Stable schemas.
- Stable caching.
- Stable tool behavior.
Open-model ecosystems don’t behave like that.
You have:
- multiple gateways
- provider-specific quirks
- inconsistent tool formatting
- varying context windows
- different reasoning formats
- fragmented caching behavior
That variance breaks generic coding agents surprisingly fast.
This is why many developers assume:
“Open models are bad at coding.”
But increasingly, the real cause is simpler:
the harness wasn’t engineered properly for open models.
Closed providers like OpenAI and Anthropic hide enormous amounts of runtime complexity:
- integrated caching
- standardized APIs
- stable model IDs
- predictable tool behavior
- optimized infrastructure
Open-model ecosystems expose all of that complexity directly.
That means coding agents need to absorb:
- provider variance
- routing differences
- schema inconsistencies
- cache fragmentation
- gateway quirks
- context variability
If the harness cannot absorb that variance cleanly, the model appears worse than it actually is.
The Biggest Open-Model Problem Is Runtime Variance
One of the biggest lessons in building coding agents is that almost every runtime assumption eventually breaks.
At first, a coding agent might work perfectly with something simple like:
const contextLimit = 200_000
Easy.
Until users start switching between:
- 1M-token models
- 128k-token models
- multiple providers
- different gateways
Suddenly context windows stop being constants. They become runtime variables.
And once that happens:
- auto-compaction breaks
- token gauges become inaccurate
- overflow guards fail
- summaries compact too early
- retries become unreliable
The challenge stops being:
“How do we make the model smarter?”
And becomes:
“How do we make the runtime adaptive?”
That’s harness engineering.
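As a minimal sketch of what "context windows as runtime variables" could look like, the limit can be resolved per provider/model pair at request time instead of being hard-coded. Everything here is an assumption for illustration: the provider and model names, the window sizes, the 16k output reservation, and the 80% compaction threshold are hypothetical, not values from any real gateway.

```typescript
// Hypothetical lookup table: context window per provider/model pair.
// The names and numbers are illustrative, not real provider limits.
const CONTEXT_WINDOWS = new Map<string, number>([
  ["provider-a:big-model", 1_000_000],
  ["provider-b:big-model", 200_000], // same model, smaller window behind another gateway
  ["provider-a:small-model", 128_000],
]);

// Conservative fallback when the pair is unknown.
const DEFAULT_CONTEXT_WINDOW = 128_000;

interface TokenBudget {
  contextWindow: number;     // total tokens the model accepts
  reservedForOutput: number; // tokens held back for the next response
  compactAt: number;         // conversation size that triggers compaction
}

function resolveBudget(
  provider: string,
  model: string,
  maxOutputTokens = 16_000,
  compactPercent = 80,
): TokenBudget {
  const contextWindow =
    CONTEXT_WINDOWS.get(`${provider}:${model}`) ?? DEFAULT_CONTEXT_WINDOW;
  // Output tokens share the window, so subtract them before sizing the threshold.
  const usable = Math.max(0, contextWindow - maxOutputTokens);
  return {
    contextWindow,
    reservedForOutput: maxOutputTokens,
    compactAt: Math.floor((usable * compactPercent) / 100),
  };
}
```

Token gauges, overflow guards, and auto-compaction would then all read from the same resolved budget, so they cannot drift apart when the user changes model or provider.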
Why Mid-Conversation Model Switching Is Hard
Modern coding agents increasingly allow users to switch models mid-session.
That sounds simple.
It isn’t.
Imagine this scenario:
- User is at 600k tokens
- Running a 1M context model
- Switches to a 200k model
You can’t just update the UI and continue.
The next request would immediately fail.
The runtime has to:
- recompute limits
- recalculate token budgets
- compact conversations safely
- preserve important context
- leave room for future output
- avoid destroying conversational continuity
This is runtime orchestration.
The model itself has nothing to do with this problem.
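A rough sketch of the switch step described above, under simplifying assumptions: each message already carries a token count (the `tokens` field is hypothetical, and counts from one tokenizer may not match another model's), system messages are always kept, and the oldest turns are the ones that get dropped. A real harness would more likely summarize the dropped turns than discard them, and would avoid splitting a tool call from its result.

```typescript
interface Message {
  role: "system" | "user" | "assistant" | "tool";
  content: string;
  tokens: number; // assumed precomputed by a tokenizer for the target model
}

interface FitResult {
  messages: Message[]; // what is safe to send to the new model
  dropped: Message[];  // oldest turns that no longer fit; candidates for summarizing
  totalTokens: number;
}

function fitToNewModel(
  messages: Message[],
  newContextWindow: number,
  reservedForOutput: number,
): FitResult {
  // Leave room for the next response before counting any input.
  const budget = newContextWindow - reservedForOutput;
  const system = messages.filter((m) => m.role === "system");
  const turns = messages.filter((m) => m.role !== "system");

  let totalTokens = system.reduce((sum, m) => sum + m.tokens, 0);
  if (totalTokens > budget) {
    throw new Error("System prompt alone exceeds the new model's budget");
  }

  // Walk from newest to oldest and keep turns while they still fit.
  const kept: Message[] = [];
  for (const m of [...turns].reverse()) {
    if (totalTokens + m.tokens > budget) break;
    totalTokens += m.tokens;
    kept.unshift(m);
  }

  const dropped = turns.slice(0, turns.length - kept.length);
  return { messages: [...system, ...kept], dropped, totalTokens };
}
```

In the 600k-to-200k scenario, the harness would call something like `fitToNewModel(history, 200_000, 16_000)` before the next request and hand `dropped` to a summarizer, rather than letting the provider reject the request.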
Why Open Models Sometimes Feel Slow
Another common misconception:
“Open models are slower.”
Not necessarily.
A lot of open-model latency comes from cache behavior.
Coding agents repeatedly send:
- the same system prompt
- the same tool definitions
- append-only conversations
That should be fast.
But many open-model inference systems distribute requests across different GPU nodes.
When requests land on different nodes:
- prefix caches disappear
- prompts re-prefill from scratch
- latency spikes dramatically
The model didn’t suddenly become slower.
The runtime simply lost cache locality.
Closed providers often hide this problem internally through:
- infrastructure-level caching
- stable routing
- integrated orchestration
Open-model systems expose it directly.
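One way to recover cache locality, where the harness or gateway controls which node serves a request, is to pin each session to a node. The sketch below assumes exactly that control; the node names are hypothetical, and many hosted providers do not expose such a knob. It hashes the session ID with an FNV-1a-style hash (over UTF-16 code units, for brevity) and picks a node by modulo.

```typescript
// FNV-1a-style 32-bit hash over UTF-16 code units.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5; // FNV offset basis
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash >>> 0;
}

// Route every request from the same session to the same node,
// so the shared prefix (system prompt, tools, history) stays cached there.
function pickNode(sessionId: string, nodes: readonly string[]): string {
  const node = nodes[fnv1a(sessionId) % nodes.length];
  if (node === undefined) {
    throw new Error("No inference nodes configured");
  }
  return node;
}
```

Plain modulo hashing remaps most sessions whenever the node list changes; consistent or rendezvous hashing would reduce that churn at the cost of a little more code.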
Tool Calling Is Mostly a Contract Problem
Another thing many developers misunderstand:
Most tool-calling failures are not intelligence failures.
They’re contract mismatches.
Across models like:
- DeepSeek
- Qwen
- GLM
- Kimi
the same problems repeat constantly:
- passing null instead of omitting fields
- emitting arrays as JSON strings
- wrapping values incorrectly
- mismatching expected containers
These failures are usually deterministic.
Not random hallucinations.
The fix often isn’t:
“Use a smarter model.”
It’s:
“Build a better runtime contract layer.”
That means:
- schema-aware retries
- automatic repair systems
- validator-guided correction
- relational defaults
- transparent recovery feedback
The harness becomes responsible for mediating between:
- model behavior
- tool expectations
And that layer dramatically changes real-world coding quality.
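Because these failures are deterministic, a small repair pass can sit in front of the validator. The following is a sketch under stated assumptions: the `FieldSpec` schema shape is a simplified stand-in for a real JSON Schema, and only three repairs are covered (null on an optional field, an array sent as a JSON string, a bare value where an array is expected). Fields not named in the schema are ignored.

```typescript
type FieldType = "string" | "number" | "boolean" | "array";

interface FieldSpec {
  type: FieldType;
  required?: boolean;
}

type ToolSchema = Record<string, FieldSpec>;

interface RepairResult {
  args: Record<string, unknown>;
  repairs: string[]; // notes that can be surfaced to the model as recovery feedback
}

function repairToolArgs(raw: Record<string, unknown>, schema: ToolSchema): RepairResult {
  const args: Record<string, unknown> = {};
  const repairs: string[] = [];

  for (const [name, spec] of Object.entries(schema)) {
    let value = raw[name];
    if (value === undefined) continue;

    if (value === null) {
      if (spec.required) {
        args[name] = null; // leave it for the validator to reject
      } else {
        repairs.push(`${name}: dropped null for optional field`);
      }
      continue;
    }

    if (spec.type === "array" && typeof value === "string") {
      try {
        const parsed: unknown = JSON.parse(value);
        if (Array.isArray(parsed)) {
          value = parsed;
          repairs.push(`${name}: parsed JSON string into array`);
        }
      } catch {
        // Not valid JSON; fall through and wrap the raw string instead.
      }
    }

    if (spec.type === "array" && !Array.isArray(value)) {
      value = [value];
      repairs.push(`${name}: wrapped single value in array`);
    }

    args[name] = value;
  }

  return { args, repairs };
}
```

Returning the `repairs` list alongside the repaired arguments keeps recovery transparent: the harness can run the tool and still tell the model what it got wrong.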
The Biggest Open-Model Problem Is Identity
Another surprisingly difficult problem:
Model identity.
Different providers expose the same model using completely different names.
For example:
moonshotai/Kimi-K2-Instruct
moonshot/kimi-k2-6
@moonshot/kimi-k2-6
All technically the same model.
But different providers require different formats.
If the runtime treats model identity as raw string equality:
- caching breaks
- telemetry fragments
- fallbacks fail
- evals become inaccurate
- routing becomes inconsistent
The solution is canonicalization.
Internally, the runtime should treat:
kimi-k2-6
as the canonical identity everywhere.
Provider-specific translation only happens at the final SDK boundary.
That single abstraction fixes:
- routing consistency
- cache stability
- fallback behavior
- evaluation accuracy
- telemetry correctness
Small runtime abstractions become extremely load-bearing in coding agents.
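A minimal sketch of that canonicalization layer, assuming a hand-maintained alias table. The three alias strings come from the example above; the provider names (`gateway-a`, `gateway-b`, `gateway-c`) and the pairing of each provider with a particular format are hypothetical.

```typescript
// Every known spelling maps to one canonical ID (keys are lowercased).
const CANONICAL_BY_ALIAS = new Map<string, string>([
  ["moonshotai/kimi-k2-instruct", "kimi-k2-6"],
  ["moonshot/kimi-k2-6", "kimi-k2-6"],
  ["@moonshot/kimi-k2-6", "kimi-k2-6"],
  ["kimi-k2-6", "kimi-k2-6"],
]);

// Hypothetical providers: which one expects which string is an assumption.
const PROVIDER_IDS = new Map<string, string>([
  ["gateway-a:kimi-k2-6", "moonshotai/Kimi-K2-Instruct"],
  ["gateway-b:kimi-k2-6", "moonshot/kimi-k2-6"],
  ["gateway-c:kimi-k2-6", "@moonshot/kimi-k2-6"],
]);

// Used everywhere inside the runtime: caching keys, telemetry, evals, fallbacks.
function canonicalize(modelId: string): string {
  const canonical = CANONICAL_BY_ALIAS.get(modelId.trim().toLowerCase());
  if (canonical === undefined) {
    throw new Error(`Unknown model id: ${modelId}`);
  }
  return canonical;
}

// Used only at the SDK boundary, right before the request is sent.
function toProviderId(canonicalId: string, provider: string): string {
  const providerId = PROVIDER_IDS.get(`${provider}:${canonicalId}`);
  if (providerId === undefined) {
    throw new Error(`${provider} does not serve ${canonicalId}`);
  }
  return providerId;
}
```

With this split, comparisons such as "is the fallback the same model?" operate on canonical IDs, and a provider's naming quirk never leaks past `toProviderId`.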
How Command Code Approaches Harness Engineering
This is where harness engineering becomes practical instead of theoretical.
Most coding agents were originally optimized around:
- Claude
- GPT
- tightly controlled APIs
- stable tool contracts
Open models introduce a very different environment:
- inconsistent provider formats
- fragmented caching behavior
- varying context windows
- schema mismatches
- provider-specific quirks
Generic coding agents often expose that complexity directly to users.
Command Code was designed specifically to absorb that variance at the harness layer instead.
That includes:
- canonical model identity handling
- provider-aware routing
- aggressive context management
- automatic tool-input repair
- cache-aware session routing
- multi-provider fallback orchestration
- runtime compaction systems
- capability negotiation across gateways
The goal is simple:
Make open models feel production-ready.
Because increasingly, open-model performance is less about the weights themselves and more about whether the orchestration layer understands how to run them properly.
That’s why the same open model can:
- fail in one coding agent
- and perform near frontier closed models in another
The harness determines how much of the model’s actual capability survives runtime.
The Harness Is Becoming More Important Than the Model
This is the shift most people haven’t fully realized yet.
The biggest differentiator in coding agents may no longer be:
- raw intelligence
but:
- runtime architecture
The harness determines:
- how much context survives
- how fast tools execute
- how reliable retries become
- how providers fall back
- how models recover from mistakes
- how orchestration behaves across long sessions
That’s why the same model can:
- fail completely in one coding agent
- perform near frontier closed models in another
Increasingly:
orchestration quality becomes model quality.
Why Harness Engineering Matters
Harness engineering is becoming infrastructure engineering for AI systems.
As coding agents evolve, the runtime matters just as much as the model itself.
The future winners probably won’t just be:
- companies with the smartest models
But:
- companies with the best orchestration layers
Because once intelligence becomes cheap and abundant:
coordination becomes the moat.
Final Thought
A lot of AI discourse still treats coding performance like a benchmark problem.
But real-world coding agents are runtime systems.
And runtime systems fail in subtle ways:
- cache invalidation
- provider mismatches
- context compaction
- schema drift
- retry logic
- tool orchestration
- concurrency bugs
That’s harness engineering.
And increasingly:
Open models aren’t losing because they’re weak.
They’re losing because most coding agents were never engineered properly for them in the first place.
Try Open Models in Command Code
npm i -g command-code
Sign up for Command Code. Install it, run cmd, and write some code using the open models.
